Towards Domain Independent Why Text Segment Classification Based on Bag of Function Words

Authors

  • Katsuyuki Tanaka
  • Tetsuya Takiguchi
  • Yasuo Ariki
Abstract

Question answering (QA) technology has attracted growing attention as a next-generation search technique because it makes acquiring information from the web easier. However, little research has addressed non-factoid QA, and Why Question Answering (Why-QA) in particular. In this paper, we introduce a machine learning approach that automatically constructs an SVM classifier, with function words as features, to perform Why Text Segment classification (WTS classification). WTS classification is the process of detecting text segments that describe reasons or causes; it is a subtask of Why-QA, related mainly to the answer extraction stage. We argue that function words are strong discriminators for WTS classification. Furthermore, because function words appear in almost all text segments regardless of topic domain, they also enable the construction of a domain-independent classifier. The experimental results showed a significant improvement over state-of-the-art results in both WTS classification accuracy and domain-independent capability.
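The feature representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `FUNCTION_WORDS` list, the tokenizer, and the example sentences are hypothetical and chosen only for illustration. The sketch maps a text segment to counts of function words and ignores all content words, so two segments from different domains that share the same function words receive the same vector.

```python
import re
from collections import Counter

# Hypothetical, illustrative function-word list (an assumption of this sketch;
# the list actually used in the paper is not given here).
FUNCTION_WORDS = ["because", "since", "so", "of", "to", "the", "is", "in", "due", "as"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def function_word_vector(text, vocabulary=FUNCTION_WORDS):
    """Count each function word in `vocabulary`; all other words are ignored."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Two segments from different topic domains with the same function words.
medical = "The patient recovered because the treatment worked."
finance = "The market recovered because the policy worked."

print(function_word_vector(medical))
print(function_word_vector(finance))
```

Both calls print the same vector, since the content words ("patient", "market", and so on) never enter the representation. In a full pipeline, vectors like these, paired with why/non-why labels, would be passed to an SVM trainer such as scikit-learn's `LinearSVC`; that choice of library is likewise an assumption and is not stated in the source.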


Similar resources

Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...


Selecting Text Features for Gene Name Classification: from Documents to Terms

In this paper we discuss the performance of a text-based classification approach by comparing different types of features. We consider the automatic classification of gene names from the molecular biology literature, by using a support-vector machine method. Classification features range from words, lemmas and stems, to automatically extracted terms. Also, simple co-occurrences of genes within ...


Why Text Segment Classification Based on Part of Speech Feature Selection

The aim of our research is to develop a scalable automatic why-question answering system for English based on a supervised method that uses part-of-speech analysis. The prior approach consisted of building a why-classifier using function words. This paper investigates the performance of combining supervised data mining methods with various feature selection strategies in order to obtain a more ac...


Using Graph-Kernels to Represent Semantic Information in Text Classification

Most text classification systems use a bag-of-words representation of documents to find the classification target function. Linguistic structures such as morphology, syntax and semantics are completely neglected in the learning process. This paper proposes a new document representation that, while including its context-independent sentence meaning, is able to be used by a structured kernel functio...


CSCR010: Second Year Report

My PhD research focuses on Text Mining, one major research school in Knowledge Discovery in Databases (KDD), and in particular on Text Preprocessing (TPP) for classification / categorization of documents utilizing novel algorithms for the identification of hidden patterns, rules, regularities and trends within these documents. Significant techniques in Data Mining, another well-known ...



Publication date: 2012